A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

Authors

  • Enmei Tu
  • Yaqian Zhang
  • Lin Zhu
  • Jie Yang
  • Nikola K. Kasabov
Abstract

k Nearest Neighbors (kNN) is one of the most widely used supervised learning algorithms for classifying Gaussian distributed data, but it does not achieve good results when applied to nonlinear manifold distributed data, especially when only a very limited number of labeled samples is available. In this paper, we propose a new graph-based kNN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an R-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix, and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on a local neighborhood reconstruction. Comparison experiments are conducted on both synthetic and real-world data sets to demonstrate the validity of the proposed kNN algorithm and its improvements over other versions of the kNN algorithm. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional kNN algorithm, the proposed manifold version of kNN shows promising potential for classifying manifold-distributed data.

Corresponding author: Enmei Tu. Preprint submitted to Elsevier, June 6, 2016. arXiv:1606.00985v1 [cs.LG], 3 Jun 2016.
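The steps named in the abstract (build a graph, compute a TRW similarity matrix, pick nearest neighbors by TRW weight, sum the weights per class) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract gives no formulas, so several pieces are assumptions: the TRW matrix is taken to be the damped walk sum `(I - alpha * P)^-1` over a row-normalized kNN graph, the edge weights are Gaussian, neighbors are searched among labeled points only, and the R-level strengthened tree and the online update are omitted. The names `knn_graph`, `trw_matrix`, `trw_knn_predict`, and the parameter `alpha` are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric kNN adjacency with Gaussian edge weights (assumed choice)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]          # k nearest per row
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    sigma2 = d2[rows, cols].mean()               # heuristic bandwidth
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)                    # symmetrize

def trw_matrix(W, alpha=0.9):
    """Assumed TRW form: sum over t of (alpha * P)^t = (I - alpha * P)^-1."""
    P = W / W.sum(axis=1, keepdims=True)         # row-stochastic transitions
    return np.linalg.inv(np.eye(len(W)) - alpha * P)

def trw_knn_predict(T, labels, k):
    """labels uses -1 for unlabeled points. Each unlabeled point takes its
    k labeled points with the largest TRW weight and sums weights per class."""
    labeled = np.flatnonzero(labels >= 0)
    classes = np.unique(labels[labeled])
    pred = labels.copy()
    for i in np.flatnonzero(labels < 0):
        sim = T[i, labeled]
        top = np.argsort(sim)[::-1][:k]          # indices into `labeled`
        top_labels = labels[labeled[top]]
        scores = [sim[top][top_labels == c].sum() for c in classes]
        pred[i] = classes[int(np.argmax(scores))]
    return pred

# Toy data: two well-separated blobs, one labeled sample in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = -np.ones(40, dtype=int)
labels[0], labels[20] = 0, 1
T = trw_matrix(knn_graph(X, k=10), alpha=0.9)
pred = trw_knn_predict(T, labels, k=1)
```

Because similarity is measured through walks on the graph rather than straight-line distance, a labeled point's influence follows the connected structure of the data, which is the intuition behind using such a matrix for manifold-distributed data.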

Similar resources

Using the Mutual k-Nearest Neighbor Graphs for Semi-supervised Classification on Natural Language Data

The first step in graph-based semi-supervised classification is to construct a graph from input data. While the k-nearest neighbor graphs have been the de facto standard method of graph construction, this paper advocates using the less well-known mutual k-nearest neighbor graphs for high-dimensional natural language data. To compare the performance of these two graph construction methods, we ru...
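To make the distinction between the two graph constructions concrete, here is a small sketch under the usual definitions: a k-nearest neighbor graph keeps an edge if either endpoint lists the other among its k nearest neighbors, while a mutual k-nearest neighbor graph keeps it only if both do. The function names and the toy data are illustrative assumptions and are not taken from the cited paper.

```python
import numpy as np

def knn_indicator(X, k):
    """A[i, j] is True when j is among the k nearest neighbors of i."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros(d2.shape, dtype=bool)
    A[np.arange(len(X))[:, None], idx] = True
    return A

def knn_graph(X, k):
    """Edge if i lists j OR j lists i."""
    A = knn_indicator(X, k)
    return A | A.T

def mutual_knn_graph(X, k):
    """Edge only if i lists j AND j lists i."""
    A = knn_indicator(X, k)
    return A & A.T

# Four points on a line; only the first two are each other's nearest neighbor.
X = np.array([[0.0], [1.0], [3.0], [10.0]])
G_knn = knn_graph(X, k=1)
G_mutual = mutual_knn_graph(X, k=1)
```

On this toy input the kNN graph forms a chain through all four points, while the mutual graph keeps only the edge between the first two, which shows how the mutual construction prunes one-sided links to distant or outlying points.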

Un-Normalized Graph P-Laplacian Semi-Supervised Learning Method Applied to Cancer Classification Problem

A successful classification of different tumor types is essential for successful treatment of cancer. However, most prior cancer classification methods are clinical-based and have inadequate diagnostic ability. Cancer classification using gene expression data is very important in cancer diagnosis and drug discovery. The introduction of DNA microarray techniques has made simultaneous monitoring ...

A Comparison of Graph Construction and Learning Algorithms for Graph-Based Phonetic Classification

Graph-based semi-supervised learning (SSL) algorithms have been widely applied in large-scale machine learning. In this work, we show different graph-based SSL methods (modified adsorption, measure propagation, and prior-based measure propagation) and compare them to the standard label propagation algorithm on a phonetic classification task. In addition, we compare 4 different ways of construct...

Graph based semi-supervised human pose estimation: When the output space comes to help

In this letter, we introduce a semi-supervised manifold regularization framework for human pose estimation. We utilize the unlabeled data to compensate for the complexities in the input space and model the underlying manifold by a nearest neighbor graph. We argue that the optimal graph is a subgraph of the k nearest neighbors (k-NN) graph. Then, we estimate distances in the output space to appr...

Random Graphs for Structure Discovery in High-dimensional Data

Originally motivated by computational considerations, we demonstrate how computationally efficient and scalable graph constructions can be used to encode both statistical and spatial information and address the problems of dimension reduction and structure discovery in high-dimensional data, with provable results. We discuss the asymptotic behavior of power weighted functionals of minimal Euclide...

Journal:
  • Inf. Sci.

Volume 367-368

Publication date: 2016